Multi-Modal Dataset Acquisition for Photometrically Challenging Objects
This paper addresses the limitations of current datasets for 3D vision tasks
in terms of accuracy, size, realism, and suitable imaging modalities for
photometrically challenging objects. We propose a novel annotation and
acquisition pipeline that enhances existing 3D perception and 6D object pose
datasets. Our approach integrates robotic forward-kinematics, external infrared
trackers, and improved calibration and annotation procedures. We present a
multi-modal sensor rig, mounted on a robotic end-effector, and demonstrate how
it is integrated into the creation of highly accurate datasets. Additionally,
we introduce a freehand procedure for wider viewpoint coverage. Both approaches
yield high-quality 3D data with accurate object and camera pose annotations.
Our methods overcome the limitations of existing datasets and provide valuable
resources for 3D vision research.
Comment: Accepted at ICCV 2023 TRICKY Workshop
Polarimetric Information for Multi-Modal 6D Pose Estimation of Photometrically Challenging Objects with Limited Data
6D pose estimation pipelines that rely on RGB-only or RGB-D data show
limitations for photometrically challenging objects with, e.g., textureless
surfaces, reflections, or transparency. A supervised learning-based method
utilising complementary polarisation information as input modality is proposed
to overcome such limitations. This supervised approach is then extended to a
self-supervised paradigm by leveraging physical characteristics of polarised
light, thus eliminating the need for annotated real data. The methods achieve
significant advancements in pose estimation by leveraging geometric information
from polarised light and incorporating shape priors and invertible physical
constraints.
Comment: Accepted at ICCV 2023 TRICKY Workshop
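The pose-estimation advancements described above are conventionally quantified with the average distance (ADD) metric, which compares a model's points under the predicted and ground-truth rigid transforms. The abstract does not state the exact evaluation protocol, so the following is a minimal, commonly used sketch rather than the paper's own code; the function name and argument layout are illustrative.

```python
import numpy as np

def add_metric(R_pred, t_pred, R_gt, t_gt, points):
    """Average distance (ADD) between a model's 3D points transformed by
    the predicted pose and by the ground-truth pose.

    R_*: (3, 3) rotation matrices; t_*: (3,) translations;
    points: (N, 3) object model points in the object frame.
    """
    p_pred = points @ R_pred.T + t_pred  # points under predicted pose
    p_gt = points @ R_gt.T + t_gt        # points under ground-truth pose
    # mean Euclidean distance between corresponding points
    return float(np.linalg.norm(p_pred - p_gt, axis=1).mean())
```

A prediction is then typically counted as correct when its ADD falls below a fraction (often 10%) of the object's diameter.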
Polarimetric Pose Prediction
Light has many properties that vision sensors can passively measure.
Colour-band separated wavelength and intensity are arguably the most commonly
used for monocular 6D object pose estimation. This paper explores how
complementary polarisation information, i.e. the orientation of light wave
oscillations, influences the accuracy of pose predictions. A hybrid model that
leverages physical priors jointly with a data-driven learning strategy is
designed and carefully tested on objects with different levels of photometric
complexity. Our design significantly improves the pose accuracy compared to
state-of-the-art photometric approaches and enables object pose estimation for
highly reflective and transparent objects. A new multi-modal instance-level 6D
object pose dataset with highly accurate pose annotations for multiple objects
with varying photometric complexity is introduced as a benchmark.
Comment: Accepted at ECCV 2022; 25 pages (14 main paper + References + 7
Appendix)
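The "orientation of light wave oscillations" mentioned above is typically recovered from four intensity images captured behind linear polarisers at 0°, 45°, 90°, and 135°, via the linear Stokes parameters. The sketch below shows this standard physics, not the paper's specific pipeline; the function name is illustrative.

```python
import numpy as np

def polarisation_cues(i0, i45, i90, i135):
    """Degree (DoLP) and angle (AoLP) of linear polarisation from four
    intensity images taken behind polarisers at 0/45/90/135 degrees."""
    # Linear Stokes parameters
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                        # horizontal vs vertical
    s2 = i45 - i135                      # diagonal components
    # DoLP in [0, 1]; guard against division by zero in dark pixels
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / np.clip(s0, 1e-8, None)
    # AoLP in [-pi/2, pi/2]
    aolp = 0.5 * np.arctan2(s2, s1)
    return dolp, aolp
```

The AoLP map encodes surface-orientation cues, which is what makes polarisation a useful geometric complement to RGB for textureless or reflective objects.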
HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Pose Dataset with Household Objects in Realistic Scenarios
Estimating the 6D pose of objects is a major 3D computer vision problem.
Given the promising outcomes of instance-level approaches, research is also
moving towards category-level pose estimation for more practical application
scenarios. However, unlike well-established instance-level pose datasets,
available category-level datasets fall short in annotation quality and in the
quantity of provided poses. We propose the new category-level 6D pose dataset HouseCat6D
featuring 1) Multi-modality of Polarimetric RGB and Depth (RGBD+P), 2) 194
highly diverse objects across 10 household object categories, including 2
photometrically challenging categories, 3) High-quality pose annotation with an
error range of only 1.35 mm to 1.74 mm, 4) 41 large-scale scenes with extensive
viewpoint coverage and occlusions, 5) Checkerboard-free environment throughout
the entire scene, and 6) Additionally annotated dense 6D parallel-jaw grasps.
Furthermore, we provide benchmark results of state-of-the-art
category-level pose estimation networks.
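Category-level benchmarks like the one above commonly report success under joint rotation/translation thresholds (e.g. 5° and 5 cm). The abstract does not specify the exact metric, so the following is a hedged sketch of that common protocol; the function names and default thresholds are illustrative.

```python
import numpy as np

def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    cos = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def pose_success(R_pred, t_pred, R_gt, t_gt,
                 rot_thresh_deg=5.0, trans_thresh=0.05):
    """True if the pose is within both the rotation threshold (degrees)
    and the translation threshold (same unit as t, e.g. metres)."""
    rot_ok = rotation_error_deg(R_pred, R_gt) <= rot_thresh_deg
    trans_ok = np.linalg.norm(t_pred - t_gt) <= trans_thresh
    return bool(rot_ok and trans_ok)
```

Reporting the fraction of test poses passing this check yields the familiar "5°5cm" accuracy number.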
On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks
Learning-based methods to solve dense 3D vision problems typically train on
3D sensor data. Each underlying distance-measurement principle has its own
advantages and drawbacks, yet these are typically neither compared nor
discussed in the literature due to a lack of multi-modal datasets.
Texture-less regions are problematic for structure from motion and stereo,
reflective materials pose issues for active sensing, and distances to
translucent objects are difficult to measure with existing hardware. Training
on inaccurate or corrupt data
induces model bias and hampers generalisation capabilities. These effects
remain unnoticed if the sensor measurement is treated as ground truth during
evaluation. This paper investigates the effect of sensor errors on the
dense 3D vision tasks of depth estimation and reconstruction. We rigorously
show the significant impact of sensor characteristics on the learned
predictions and notice generalisation issues arising from various technologies
in everyday household environments. For evaluation, we introduce a carefully
designed dataset\footnote{dataset available at
https://github.com/Junggy/HAMMER-dataset} comprising measurements from
commodity sensors, namely D-ToF, I-ToF, passive/active stereo, and monocular
RGB+P. Our study quantifies the considerable sensor noise impact and paves the
way to improved dense vision estimates and targeted data fusion.Comment: Accepted at CVPR 2023, Main Paper + Supp. Mat. arXiv admin note:
substantial text overlap with arXiv:2205.0456
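Quantifying "the considerable sensor noise impact" as described above requires comparing each sensor's depth map against a more accurate reference over the pixels where both are valid. The snippet below is a minimal sketch of two standard depth-error metrics (RMSE and absolute relative error); it is illustrative and not taken from the paper's evaluation code.

```python
import numpy as np

def depth_errors(pred, gt, valid=None):
    """RMSE and absolute relative error between predicted and reference
    depth maps, restricted to valid reference pixels.

    pred, gt: same-shape float arrays of depth values.
    valid: optional boolean mask; defaults to gt > 0 (zero = no reading).
    """
    if valid is None:
        valid = gt > 0
    p, g = pred[valid], gt[valid]
    rmse = np.sqrt(np.mean((p - g) ** 2))      # penalises large outliers
    abs_rel = np.mean(np.abs(p - g) / g)       # scale-aware relative error
    return float(rmse), float(abs_rel)
```

Masking invalid pixels matters precisely for the failure modes the paper highlights: reflective and translucent surfaces often yield zero or corrupt readings, and including them would silently distort the scores.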